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Abstract 

This paper presents some convergence theory for nonlinear Krylov subspace methods. The 
basic idea of these methods, which have been described by the authors in an earlier paper, is 
to use variants of Newton’s iteration in conjunction with a Krylov subspace method for solving 
the Jacobian linear systems. These methods are variants of inexact Newton methods where the 
approximate Newton direction is taken from a subspace of small dimension. The main focus of 
this paper is to analyze these methods when they are combined with global strategies such as 
linesearch techniques and model trust region algorithms. Most of the convergence results are 
formulated for projection onto general subspaces rather than just Krylov subspaces. 
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1 Introduction 


In a previous paper [5] we have proposed several basic methods based upon the idea of employing a 
Newton iteration in which the Jacobian equations are solved approximately by a Krylov subspace 
method. Several theoretical issues raised in [5] were left unanswered. The purpose of this paper is 
to fill this gap by laying down the theoretical foundation of nonlinear Krylov subspace methods and 
by providing convergence results for them. In fact we will not limit ourselves to Krylov subspace 
methods. Rather, we discuss inexact Newton methods based on general projection techniques. 

When defining algorithms for solving nonlinear systems of equations there are two possible 
options. First, one can use one of the globally convergent modifications of Newton’s iteration [11]. 
The linear systems that arise in the course of the Newton iteration can be solved by either a direct 
solver or they may be solved approximately by an iterative method. The class of methods based on 
the latter approach is a particular case of inexact Newton methods and several such methods were 
considered in [1, 3, 4, 5]. Newton’s method is essentially a linearization procedure. The mapping 
F is locally approximated by a linear function and the resulting linear equations are solved to yield 
the next point. The second approach to solving nonlinear equations does not rely on linearization. 
Thus, fixed point iterations are inherently nonlinear as are descent methods with accurate line 
searches. Another well-known example is that of the nonlinear conjugate gradient iteration. 

We restrict our attention here to the first approach. In particular, we use linear Krylov methods 
to solve approximately the Newton equations. The Krylov methods considered are Amoldi’s Method 
[24], and the Generalized Minimum Residual Method (GMRES) [25]. In general, these methods have 
the virtue of requiring virtually no matrix storage, and as such have a distinct advantage over direct 
methods. 

To be more specific, consider finding a solution u* of the nonlinear system of equations 

F(u) = 0, (1.1) 

where F is a nonlinear function from R^ to R^. Newton’s method applied to (1.1) results in the 
iteration 

1. Set uq = an initial guess. 

2. For n = 0, 1, 2, • • • until convergence do: 

Solve J(u n )6 n = -F^), (1.2) 

Set u n+ i = u n + tf n , 

where J(u n ) = F f (u n ) is the system Jacobian. For the problems under consideration, N is large, 
and as a result the so-called inexact Newton methods [9] solve (1.2) only approximately. 

One of the main advantages of the Krylov methods is that only the action of the Jacobian 
matrix J times a vector v is required, not J explicitly. In the current setting, this action can be 
well approximated by a difference quotient of the form 

(7 

where u is an approximation to a solution of (1.1), and <r is a scalar. Here, we will address the 
convergence behavior of the above algorithms when combined with a global linesearch backtracking 
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strategy or model trust region approach. We should emphasize that our convergence results are not 
restricted to the use of Krylov subspace methods when solving (1.2). Our theory is formulated in 
terms of projection techniques wherein the approximation to the linear system (1.2) is taken from 
a small dimensional subspace. 

Inexact Newton algorithms have been studied by several authors in recent years, see for example 
the references in [1] and the recent report [6]. Several authors have considered using Krylov methods 
inside a Newton iteration in the context of systems of ordinary differential equations [3, 4, 7, 14]. 
Steihaug [27] and O’Leary [20] have used the Conjugate Gradient (CG) method in the unconstrained 
optimization of a real- valued function of several variables. Nash [18, 19] has also used a Newton-CG 
algorithm in unconstrained optimization. Wigton et al. [29], and more recently Kerkhoven and 
Saad [16] have accelerated nonlinear fixed point iterations of the form «n+i = M(u n ) by applying 
this approach to solving the nonlinear system of equations u—M{u) = 0. Note that as was observed 
by Chan and Jackson [7], the new system of equations u— M(u) = 0 can be viewed as a nonlinearly 
preconditioned version of the original system of equations. 

In Section 2, we review inexact Newton algorithms, and present versions of the Newton-Amoldi 
and Newton- GMRES methods. In Section 3 we give a convergence theory for inexact Newton 
methods when combined with a linesearch backtracking global strategy, and then in Section 4 
we present a convergence theory for inexact Newton methods combined with model trust region 
strategies. In Section 5, we discuss applications of the basic results in the previous two sections to 
the Newton-Krylov methods, and then make some concluding remarks in Section 6. 
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2 Newton-Krylov methods 

In this section we review some of the basic ideas of inexact Newton methods and Newton-Krylov 
algorithms. We begin with a discussion of the relevant results from Dembo, Eisenstat and Steihaug 
[9] on inexact Newton methods, and then present the two inexact Newton methods we considered 
in [5], namely the Newton-Amoldi and Newton-GMRES methods. Note that a Newton-Krylov 
method is one example of an inexact Newton algorithm. 

2.1 Inexact Newton methods 

Prom [9], an inexact Newton method for (1.1) has the following general form: 

1. Choose tto an initial guess for it,. 

2. For n = 0, 1, • ■ ■ until convergence, do: 

• Choose rj n £ [0, 1). 

• Find (in some unspecified manner) a vector 6 n satisfying 

•J(«n)tfn = -F(lin) + r n , with < Vn (2.1) 

• Set u n+1 = u n + 8 n . 

The residual r n represents the amount by which S n fails to satisfy the Newton equation (1.2). It is 
not generally known in advance, being the result of some inner algorithm which produces only an 
approximate solution to (1.2) (e.g., an iterative method). The forcing sequence T] n £ (0, 1) is used 
to control the level of accuracy. Also, |j • || represents any norm on It N . 

We will make the following assumptions on F: 

{ There exists a u, £ TL N with F(ti,) = 0. 

F is C 1 in a neighborhood of u,. (2*2) 

J(u,) = F'(u ,) is nonsingular. 

The next theorem is shown in [9]: 

Theorem 2.1 Assume that F satisfies (2.2), and that T) n < T )™** < t < 1. There exists e > 0 such 
that i/ 1 1 Uo — xt* | [ < e, then the sequence of inexact newton iterates {ti n } converges to u ,. Moreover, 
the convergence is linear in the sense that 

||®n+I — u *l|* — ^|| u n — ***11*5 
where ||y||* = ||7(u*)y||. If in addition, 

T) n -> 0, (2.3) 

then the sequence {u n } converges to u t superlinearly. Also, if F' is Lipschxtz continuous near it, 
and T) n = 0(||.F(u n )||), then the convergence is quadratic. 
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In the above theorem, || • || again represents any norm on R . 

For the case when N is large, a 6 n satisfying the residual condition (2.1) is often obtained 
by using an iterative procedure for the linear system. In [5], we considered using the Araoldi and 
GMR.ES algorithms for nonsymmetric linear systems to obtain £ n ’s satisfying the residual condition 
(2.1). For the convergence theory presented in this paper, the actual method which produces a S n 
satisfying (2.1) will be left unspecified. All that will be required is the existence of such a 8 n . This 
is easily guaranteed by assuming that J n is nonsingular for all n. 

2.2 Newton-Arnoldi and Newton-GMRES 

At each iteration of the inexact Newton algorithm, we must obtain an approximate solution of the 
linear system (1.2) which we rewrite as 

JS = -F, (2.4) 

where F and its Jacobian J are evaluated at the current iterate. If is an initial guess for the 
true solution of (2.4), then letting 8 — 6^ + z, we have the equivalent system 

Jz = A°\ (2.5) 

where = — F — J8^ is the initial residual. For a general N X N matrix A and vector v, define 
the Krylov subpsace K(A,v,m) by 

K(A, v, m) = span{v,Av, 


Let K m denote 

K m = K(J,r(°\m). 

Amoldi’s method and GMRES both find an approximate solution 

*("•> = $(°) + z< m ), with z< m ) G K m , 


such that either 

(-F - JS^) ± K m (equivalently (r^ - Jz^) ± K m ) (2.6) 

for Amoldi’s method, or 

Ilf 1 + J^ m) || 2 = min ||F + J£|| 2 (= ||r( 0) - Jz|| 2 ) (2.7) 

for GMRES. Note that this condition is equivalent to demanding that the residual = —F - 
be orthogonal to JK m . Here, || • || 2 denotes the Euclidean norm on and orthogonality is 
meant in the usual Euclidean sense. 

The following algorithm is a nonlinear version of the Amoldi (GMRES) algorithm, which at 
every outer iteration generates an orthonormal system of vectors Vj (t = 1, 2, • • • , m) of the subspace 
K m and then builds the vector that satisfies (2.6) (or (2.7) for GMRES). In both algorithms, 
vi is obtained by normalizing r(°). 
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Algorithm : Newt on- Arnold! (Newton-GMRES) 

1 . Start: Choose uo and compute F(uo). Set n = 0. Choose a tolerance Co- 

2. Amoldi process: 

• For an initial guess form r(°) = —F — J8^°\ where F = i ? (« n ) and J — J(ti n ). 

• Compute /3 = ||r (°)||2 and ui = 

• For j = 1, 2, • • • , do: 

(a) Form Jvj and orthogonalize it against the previous t/i, • ■ • , Vj via 

hi t j — ( Jvj , Vj)j * = 1) 2j ■ • •, j, 

Vj+I = Jvj -^2 hijVi (2.8) 

»=1 

hj+ij = ll®j+ill*i 

v j + 1 = 

(b) Compute the residual norm Pj = |.F - 1 - of the solution 8^ that would be 

obtained if we stopped at this step. 

(c) Ifpi<€ n set m — j and go to (3). 

3. Form the approximate solution: 

Arnold!: Define H m to be the m x m (Hessenberg) matrix whose (possibly) nonzero entries 
are the coefficients hij, 1 < i < j, 1 < j < m and define V m = [vi, ' • ■ > ®m]» 

• Find the vector y m which solves the linear system H m y = /?ei, where e\ = [1, 0, • • • , 0] T . 

• Compute 4 - z( m \ where z( m ) = V m ymi and itn+i = u n + 8^ m \ 

GMRES: Define H m to be the (m + 1 ) x m (Hessenberg) matrix whose nonzero entries are 
the coefficients h^, 1 < i < j + 1, 1 < j < m and define V m = [t>i, v 2 , - w m ]. 

• Find the vector y m which minimizes ||/3ei - H m y\\ 2 , where e\ — [1, 0, . . ., 0] T , over all 
vectors y in R m . 

• Compute 4 . z( m ) where z( m ) = V m y m , and Un+i = «n + 

4. Stopping test: If u n+ i is determined to be a good enough approximation to a root of (1.1), 
then stop, else set u„ <— Un+i , n ♦— n + 1 , choose a new tolerance e n , and go to ( 2 ). 


Therefore, in both Amoldi and GMRES the outer iteration is of the form itn+i = u n + £(”*) 
where 8 + z^ m \ with 

* (TO) = V m y m , 

and y m is either the solution of an m X m linear system, for Amoldi, or the solution ofan(m+l)xm 
least squares problem for GMRES. 
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For simplicity, we have omitted several details of the practical implementation of the above 
linpftr and nonlinear methods, which are discussed at length in [5], [24], [4] and [25]. For example, 
the residual norm pj referred to in step 2 of the algorithms does not require the computation of 
the approximate solution 6^ at every step. Instead an inexpensive formula, which evaluates pj, is 
updated at each step while the factorization of the Hessenberg matrix H m or H m is updated. 

An important observation that has been very useful in practice is that there is no need to 
explicitly compute the Jacobian matrix J ( u n ) . This is due to the fact that the above algorithm 
only requires the product of this Jacobian times a vector and this can be well approximated by the 


difference formula: 


J(u)v & 


F(u + <rv) - F(u) 
<r 


(2.9) 


In [1], Brown has given an analysis of the resulting inexact Newton/finite-difference Krylov algo- 
rithms when using (2.9) to approximate J(u)v. Sufficient conditions are given in [1] on the size 
of the (t's in the finite-difference versions of Amoldi and GMRES which guarantee the local con- 
vergence of the Newton-Krylov iteration. These results have been extended in [4] to include a 
finite-difference version of the Conjugate Gradient iteration. 

One final aspect worth noting is the ability to use restarting in the linear Krylov methods. 
Typically, a maximum value of m is dictated by storage considerations. If we let mmax be this 
value, then it is possible that m = nw in the Amoldi process, and yet p m is still greater than e n . 
In this case, one can set equal to tf( m ) and restart the Amoldi process, effectively restarting 
the Krylov method. The convergence of such a procedure is not always guaranteed, but the idea 
seems to work well in practice. We note that for lack of a better initial guess we use = 0 on the 
first (and possibly only) pass through the Amoldi process at each stage of the Newton iteration. 
It is only when restarting that will be nonzero. We will refer to the restarted algorithms as 
Amoldi(m) and GMRES(m), where m is the maximum subspace dimension. As will be seen below, 
it will also be important to choose the tolerance e n at each step of the Newton iteration. 
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3 Global convergence results for linesearch methods 

We will be concerned with the convergence properties of the inexact Newton algorithms outlined in 
the previous section when combined with global strategies. In this section we will analyze the global 
convergence of inexact Newton algorithms when combined with linesearch backtracking strategies. 
The results given below are independent of the particular inexact Newton method used. 

To begin, let f(u) = |||F(ti)|||. An easy calculation gives 

V/(u) = J(u) t F(u), 


where J(u) = F'(u), the Jacobian matrix of F evaluated at u. Typically, convergence of a sequence 
of iterates {tin} is studied in terms of the scalar sequence 



(3.10) 


where 

6 n = u n+ i - tin and V/„ = V/(u„). 

When limn-^oo e n = 0, the sequence tin will converge to a solution u, under fairly mild conditions. 
First, let us assume that the acute angle between S n and the gradient V/ n is bounded away from 
w/2, i.e., that at every step we have 

cos0(V/„,£ n ) > e > 0, (3-11) 


I T I 

i »‘l i» lla ')* ^ en i" rom definition of e n in (3.10), the gradient V/ n 
will converge to zero whenever e n converges to zero. 

We now recall the following two important results from Ortega and Rheinboldt [21], pp. 475-476. 

Theorem 3.1 Let f : -► R 6e continuously differentiable on a compact subset D 0 C R^ 

and suppose that {tin} C Do is any sequence which satisfies lim n _ loo V/(u n ) = 0. Then the set 
(l = {u£ D 0 |V/(u) = 0} of critical points of f is not empty and, 

lim [inf ||ti — iin||] = 0 (3.12) 

n— ^oo uen 

In particular , if il consists of a single point u* then lim^oo Un = u* and V/(tt*) = 0. 

Theorem 3.2 Let f : R n R be continuously differentiable on a compact subset Dq C R^ and 
suppose that the set ft = {u £ D 0 \Vf(u) = 0} of critical points of f in D 0 is finite . Let {u n } C D 0 
be any sequence for which lim^oo V/(iXn) = 0 and lim n _^ O0 (u n +i - tx^) — 0. Then u n converges to 
a certain it* in ft and V/(u*) = 0. 

Thus, we will often attempt to establish conditions under which e n as defined by (3.10) converges 
to zero and for which (3.11) holds. 

To guarantee that the current iterate will make progress towards the solution in one step of 
the algorithm we must know that the inexact Newton step 6 is a descent direction for / at the 
current approximation u . A descent direction p at u is one for which there exists a Aq > 0 such 
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that /(« + Ap) < f(u) for all A < A 0 . As is well-known, when / is differentiable this is equivalent 
to the condition that 

V/(u) r p < 0, 

where V/(u) = (|£(u), ■ • •, §^(«)) T - As noted above, V/(u) = J{u) T F(u), and so p is a descent 
direction for / at u if 

F(u) T J(u)p < 0. 

If 6 is an approximate solution of the Newton equations 

J6 = -jF, 

with F = .F('u) and J = «7(tx), then 

F t J 6 = -F t F - F T r, (3.13) 

where f = —F — J6 is the residual associated with J. Thus, J will be a descent direction for / at 
u whenever \F T r\ < F T F. In particular, if ||r|| 2 < \\F\\ 2 , then 6 is a descent direction. This result 
was also given in [5] and is restated in the following proposition. 

Proposition 3.3 A sufficient condition for p£R N to be a descent direction for f at u is that 

||F(u) + J{u)p\\ 2 < ||-F(u)|| 2 - (3-14) 

As was seen earlier, it is also important to be able to guarantee that the angle between the 
gradient of / and the step 6 n is bounded away from tt/ 2. The next lemma gives a lower bound for 
|e n | under an additional assumption on the step direction p. 

Lemma 3.4 Let F : R^ — > R^ be continuously differentiable on R N . Let u £ R N be given with 
F(u) 0 and J(u) = F'(u) nonsingular. Consider p £ R N satisfying 

||F(u) + J(u)p|| 2 < 77|in«)l| 2 , 

with rj £ (0, 1). Then 

(3 - 15) 

where M = cond 2 (J(u)) and f(u) = l.F(u) r .F(u). 

Proof: For notational convenience, let F = F(u), f = /(«), V/ = Vf(u), and J = J(u) = 
F'(u). Note that F 0 implies V/ ^ 0. Let r be the residual associated with p so that r = F + Jp. 
Then ||r|| 2 < t/||.F|| 2 and P = -J~ X (F - r). So, 

Vf T p = (J T F) T (-J~\F-r)) 

= -F t F + F T r. 

Hence, 

|V/ r p| I F t F - F*r\ 

\\p\U - l|/- 1 (J ? -»OII* 

F t F - \F T r\ 

- iij- i ( j p - 
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fj||F|| 2 implies |F r r| < v\\ F \\h which then & ves 
F t F - \F T r\ > (1 - r,)\\F\\l 


||J'- 1 (-F-r)|| a < 
< 


ll-T'll, • ||F||, + l|.r 1 r||, 

(1 + • Ilf II,. 


(3.16) 


|V/ r p| (1 - v)l (l-l)llflb 

INI, -(i + i,)||/-*||,.||f||, (1 + >?)I|J- 1 II, 

and as a result, using the fact that ||V/|| 2 = ||7 T F|| 2 < ||7|| 2 ||.F|| 2 , we get 

lV/ r Pl > (!-»?) 

IIV/IU - ||j>|| 2 - (l + » 1)M' 

where M = cond2(J). □ 

Condition (3.15) can also be recast as 

cosS(v/ ’ ! ’ ) - (TT^Jm' 

At every step of the inexact Newton method, we require that a condition of the form 

HFW + JWPnlU^T/nll^K)^ 

Vn < V < 1 


(3.17) 

(3.18) 

(3.19) 

(3.20) 

(3.21) 


holds, and if we assume that the condition numbers M n = cond 2 ( 7(1^)) are bounded from above 
by M, then (3.15) shows that 

cose(p„,V/( U „))>ii^. (3.22) 

This implies that a sufficient condition to guarantee both p n being a descent direction and the 
validity of relation (3.11) is that the residual condition (3.20)-(3.21) holds. 

A simple consequence of the above lemma which will be useful in the section on trust region 
techniques, is that the residual norm assumption (3.20)-(3.21) implies that the cosine of the angle 
between the gradient and the Krylov subspace is bounded from below. More specifically, 

Corollary 3.5 Let F : H N — > be continuously differentiable on R^\ Let u £ R* be given with 

F(u) 7^ 0 and J(u) = F f {u ) nonsingular. Consider the subspace K = span{V} where the columns 
of V form an orthonormal set of vectors, and assume that there exists one vector p in K satisfying 

||F(«) + 7(u)p|| 2 < J7 ||F(«)|| 2 , (3.23) 

with t 7 £ (0, 1). Then 

ll^/WII, > g^ lV/Mla (3.24) 

where M = cond 2 (J(u)) and /(u) = %F(u) T F(u). 
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Proof: Let p 0 = Vy 0 be a vector of K that satisfies (3.23). We then have from the lemma 


\Vf(ufVyo\ 


1 V 


;||V/(«)||„ 


(3.25) 


INIs ~ (1 + rj)M 

where we have used the fact that ||po II2 = = 1 1 S/o 1 1 2 - Using the Cauchy-Schwartz inequality, 


||^ r V/(u)|| 2 > 
which proves the result. □ 


> -L-L-iv/WU 

llvolb (1 + V)M 


(3.26) 


3.1 Convergence of inexact Newton sequences 

In this subsection we consider the case of linearly converging inexact Newton sequences. That is, 
the sequence {r) n } in the inexact Newton method is only required to satisfy rj n < T/ma* < t < 1. 
Superlinearly converging inexact Newton sequences will be examined in the following subsection. 

An important condition to guarantee global convergence, is the so-called a-condition in the 
Armijo and Goldstein principle [11, 21] wherein S n must satisfy 

/(tin + £n) < /(«n) + “V /(tin)\. (3.27) 

We can show the following remarkably simple result if we require that the direction S n solve the 
linear system J(u n )S = —F(u n ) with a certain accuracy. 

Theorem 3.6 Let f = |||.F||2 be given, where F is differentiable, and a, q two scalars such that 
0 < a < |, 0 < 77 < 1 . Assume that the iterates u„ are defined by u n+ i = u„ + where 6 n satisfies 
(3.27) and 

II F(u n ) + J(un)S n \\ 2 < »7||F(« n )|| 2 . (3.28) 

Then 

lim /(tin) = 0. 
n— too 

Proof: In this proof we let J n = J(u n ), F n = F(u n ). Prom the condition (3.27) and the 
expression for the gradient of / we get 

/(tin+l) < /(tin) + aV/Jtfn = /(«n) + <*F* J n 6 n (3.29) 

Writing J n S n = -F n + r n this gives 

/(Un+l) < /(tin) + <*Fn(-Fn + »*„) 

= (1 - 2a)/(u n ) + aF^r n 
< (1 - 2a)/(u n ) + a||f„lbl|rn|| 2 . 

From (3.28) we have ]|r n || 2 < q||F„|| 2 which yields the following inequality, 

/(u n+ i) < (1 - 2a)/(u n ) + 2arjf(u n ) = [(1 - 2a) + 2aq]/(tin) (3.30) 

Notice that the scalar in the brackets is a convex combination of 1 and q and is therefore always 
less than one under the conditions on a and rj. The result follows immediately. □ 
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Note that we have made virtually no assumption on the function F apart from differentiability, 
and so the result is very general. However, we cannot guarantee in general that one can indeed 
select a vector 8 n that satisfies condition (3.27) and (3.28) at the same time, but we do know that 
near a solution u» for which «7(u,) is nonsingular, a sufficiently good approximation to the Newton 
step will satisfy these two conditions simultaneously. 

A more explicit result extending the above theorem is now shown. For this next theorem we 
assume that a general backtracking strategy is used. This means that the next iterate is of the 
form Un + Ap n , where p n is any descent direction and A is selected by the procedure described 
below. In the procedure the two parameters 0^, 0max are such that 0 < 0mm < #m*x < 1, a typical 
choice being 0^ = 0^ = 1/2. The procedure requires another parameter e* > 0 which is used to 
essentially rescale the starting step in the process in order to prevent it from from being too small. 

Algorithm 3.1: General Backtracking Procedure 

1. Set A = max{l,e*^^jjj^}. 

2. If f{tin + Ap n ) < f{u n ) + aAV/(u n ) r p n , then set A n = A, and exit. Else: 

3. Choose A E [0mmA, 0 m axA]; set A A. Go to (2). 

As is shown next, the sequence is well defined in that under a mild condition on the gradient of 
/ the procedure will deliver a nonzero A n in a finite number of steps. Moreover, the resulting A n 
can be bounded from below. 


Lemma 3.7 Let f be differentiable and assume that its gradient is such that 


IIV/OO - V/(y)|| 2 < 7ll z - y|| 2 » for all x,y € R N . 


(3.31) 


Let a < 1 and p n be any descent direction. Then Algorithm. 3.1 will produce an iterate u n +i = 
+ \ n Pn in a finite number of backtracking steps and A„ satisfies the inequality 


AnllPnlh > - 


V/K) T Pn 

l|Pn|| 2 


min{e* , 


(l -a) 


^min} • 


(3.32) 


Proof: The subscript n is dropped from this proof. Using the mean value theorem we have the 
equality: 

f{u + A p) = f(u ) + A V/(u + OXpfp, (3.33) 

where 0 < 9 < 1. We rewrite the above equation as 


f(u + \p) = f(u) + \Vf(u) T p + \[Vf(u + e\p) T p-Vf{u) T p] 

= f{u) + aW f(u) T p + A[(l - a)Vf(u) T p + (V/(u + OXpfp - Vf(u) T p)] 

= f(u) + aXVf(u) T p + A[(l - at)V/(u) T p + A||p n || 2 C], (3.34) 


where for convenience we have set 


V/(u + OXp^jp - V/(u) r p 
A||p|| 2 
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Note that from the assumptions we have 


|C| = |(V/(u + 0A P )-V/(u)) 2 


AlWIi 


< 7%l|2 < 7|[p||a 


(3.35) 


If the test in step 2 is passed at the first step, then the first A is accepted and, in this situation, 
the inequality (3.32) is trivially satisfied. If the test in step 2 fails for the first step, then A is 
reduced according to the rule in step 3. Moreover, after a finite number of reductions, the term 
in brackets in the right-hand- side of (3.34) will become negative and the corresponding A will be 
accepted. This will occur as soon as A^y ||p||2 < —(1 — a)V/(u) r p. The first A which is accepted 
will be such that _ 

\|i if > O (l-a)V/(») T r 

A||p||j> L 7 ||p|| 3 ’ 

and the inequality (3.32) is again satisfied. □ 

We note that the usual /3 condition of Armijo and Goldstein also guarantees that a lower bound 
on A similar to (3.32) is satisfied. Indeed, the relation (3.34) is still valid with a replaced by /3. 
Moreover, the /3 condition: 

f(u + A p) > f{u) + /3AV/(u) r p 

implies that 

(1 - p)Vf(u n ) T Pn -I- A n ||p„|| 2 C > 0. 

With the inequality (3.35) this immediately yields 


AnllPnlh > “ 


(1 -^)V/(lfn) r Pn 
7l|Pn||2 


(3.36) 


Both (3.32) and (3.36) imply that the step length from u„ is bounded from below with respect to 
V/(ti„) T Pn/|bn||2. 

We emphasize the importance of the initial A in the procedure. There is no reason why one 
should always start the process with A = 1 since ||p n ||2 can be arbitrarily small. As was explained 
before the choice of the initial A in step 1 is essentially equivalent to a rescaling of the vector p n . 
If we always start with A = 1 and p n happens to be very small at every step then the test in step 
2 may be passed immediately and there is a danger that fi n becomes too small for the iterates to 
make any progress towards the solution. 

Note, however, that if p n solves the linear system Jp = — F approximately, then we may have 
additional information that will ensure that ||p n ||2 is bounded from below. Indeed, the following 
lemma shown by Walker, [28] is just one such result. 


Lemma 3.8 Suppose that J(u)p = — F(u) + r and [|r [(2 < t]\\F(u)\\ 2 , with 0 < 77 < 1. Then , 


ibib > 


(1 - rj) V/(u) r p 

IIWII1 Iblb ‘ 


(3.37) 


Proof: We have 


|V/(u) T p| = |F(u) r J(u)p| < ||i ;, (u)||2||J r («)||2|bll2- (3.38) 
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(3.39) 


Moreover, from ||F(u)||2 = ||r - J(u)p||2 < »/||-F , («)||2 + ||^(^)l|2||p||2 we get 


m«)h < 


MHJk 

(1-7) 


IIpIIs- 


The result follows from combining (3.38) and (3.39). □ 

A consequence of the above lemma is that the backtracking procedure will always start with 
A = 1 in the first step if e* is small enough, or to be more accurate as long as 


e* < 


WWW V 


(3.40) 


This may provide for a rationed way of choosing e* since ||J(u)||2 may often be roughly estimated 
in the course of the algorithm. 

We can now prove the following theorem. 


Theorem 3.9 Let f =\\\F\\\ satisfy the conditions of the previous Lemma and let p n be such that 
||F n + JnPn 1 1 2 < i?||Fn||> f OT n > *7 < 1 * Further, let each iterate be chosen by Algorithm 3.1. 
Then, either 

lim /(u„) = 0 (3-41) 

n— too 

or 

lim ||p n || 2 = oo. (3-42) 

n — too 

Proof: Letting as before r = F(u n ) + J(u n )p ny and dropping the subscript n we have 

V/ r p = F t (-F + r) < — ||F||1(1 -V) = -2(1 - Tl)f, (3.43) 

and as a result, (with u„ +1 = u„ + A n p n = u + Ap) 

/(un + i) < /(«) + A aVf T p < f(u ) - 2Aa(l - q)/(u) = /(«)[! - 2Aa(l - q)]. (3-44) 


From the result of the Lemma we have 


- A||p||, < K 


Vf T p 


where we define 


and therefore, (3.44) becomes 


IblU 

k = min{e* , — -#mm} 

7 


(3.45) 


/( «n+l) < /(“) 


1 + 2 <*k( 1 - »7)ior 


(3.46) 


Denoting by t n the quantity V/J p n / !|Pn 111 and by c the constant 2a/c(l — tj) this relation can be 
rewritten as 

/(Un+l)</K)[l + C<n]. (3-47) 
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Since f{tin) is bounded from below and nonincreasing, it converges to a certain limit <f>. If this limit 
is zero the result of the theorem holds. If it is different from zero then by dividing both members 
of equation (3.47) by /(tin), we see that 1 + ct n which is bounded from above by 1 and from below 
by a sequence converging to 1, has 1 as its limit. Equivalently, t n converges to 0. Going back to 
the relation (3.43) which we rewrite as 2/(u n )(l - rj) < |f„| • ||p„|||, we see immediately that in this 
situation we must have ||p n [|2 — *■ oo. □ 

We mention that Eisenstat and Walker [12] have recently established an extension to this 
result. More precisely, they show that in addition to the conclusion of the above theorem, one of 
the following holds: 

(i) lim^oo ||tin|| 2 = oo 

(ii) The sequence Un has finite limit-points, and F' is singular at each of them. 

(iii) The sequence has a limit point u, such that F(u») = 0. 

We can show a result that is more explicit than that of Theorem 3.9 if we make a few additional 
assumptions on /(«„). 

Corollary 3.10 Let f = satisfy the conditions of the previous Lemma and let p n be such 

that ||.F n + J ( ti n )p„ [ 1 2 < qlli^llz for each n, with tj < 1. Further, let each iterate be chosen by 
Algorithm 3.1, and assume that J(u„) _1 exists and its norm is bounded from above for all n. Then 

lim f(u n ) = 0 (3.48) 

n — ► oo 

Proof: From the relation (3.16), and the fact that J(u„) -1 is bounded from above, the norm of 
the vector p n = J(u n ) -1 (.F(u n ) — r) is bounded from above. Therefore, from the previous theorem, 
we must have lim n _ >00 /(««) = 0. □ 

The following additional results do not require the use of the backtracking procedure described 
in Algorithm 3.1. They are based upon the ideas presented by Dennis and Schnabel [11]. Given the 
current Newton iterate u = Un and a descent direction p , we want to take a step in the direction 
of p that yields an acceptable tin+i* We will define a step 6 = Ap to be acceptable if both of the 
Goldstein- Armijo [11] conditions are met, namely 

f(u + Ap) < f(u) + a\ V/(u) r p, (3.49) 

and 

/(« + Xp) > /( tt) + /3AV/(u) r p, (3.50) 

for given scalars a, J3 satisfying 0 < a < j3 < 1. Again, we will refer to these two conditions as the 
a- find (3- conditions, respectively. For a given descent direction p, the next result shows that there 
exist points u + Xp satisfying (3.49) and (3.50). 

Theorem 3.11 Let f : — » R be continuously differentiable on with f(z) > 0 for all 

z 6 R^. Let u,p € H N be such that V f(u) T p < 0. Then given 0 < a < /? < 1, there exist 
A u > A/ > 0 such that u + Ap satisfies (3.49) and (3.50) for any A 6 (A/, A„). 

This is essentially Theorem 6.3.2, page 120, in Dennis and Schnabel [11], and so the proof is omitted. 
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Theorem 3.12 Let f : R N -*• R be continuously differentiable on R N with f(z) > 0 for all 
z 6 B. N , and assume there exists a constant 7 > 0 such that 

||V/(z)-V/(u)|| 2 <7||z-«I|2 (3.51) 

for every u,zE R N . Then given any uq E R n , there exists a sequence {u n } (n = 0, 1, • • •) satisfying 
conditions (3.49) and (3.50), and either 

V/Kftfn < 0 


or 

V/(u„) = 0 and S n = 0, 

for each n > 0, where S n = u n+ i - u n . Furthermore, for any such sequence, either 


(•) 

(b) 


V/(u n ) = 0 for some n > 0, or 


lim 

7l~ +OO 


V/(tt n ) r ^ n 

Wnh 


= 0. 


Proof: This is essentially Theorem 6.3.3 in [11] (p. 122), except that condition (3.50) is slightly 
different and / is assumed to be bounded from below. For each n, if V/(u n ) = 0, then (a) holds 
and the sequence is constant from then on. If V/(u n ) 5^ 0, then there exists a p n such that 
Vf{u n ) T p n < 0 (e.g., take p n = -V/(u„)). By Theorem 3*11, there exists A n > 0 such that 
=Un + \ n p n satisfies (3.49) and (3.50). Let S n = X n p n . We must now show that if no term 
of {£„} is zero, then (b) must hold. 

First, define u/„ = [ ] 1 1 2 and 

_ Vf{u n ) T 6 n 

<T n — 

By (3.49) and UiOi < 0 for every *, we have for any j > 0, 


3- 1 

/(uj) - /( u o) = X /(“"+ 1 ) " /(“") 

71=0 

< Y^aVf{uif6i 
i=0 

j - 1 

= a < 0 . 

;=o 


Hence, / > 0 on implies that the series 


oo 

< OO. 

i=0 


Thus, uf n a n 0 as n -mx). To conclude that a n -> 0 we must use condition (3.50) whose purpose 
was to guarantee that the steps do not get too small. 

By the Mean Value Theorem, there exists a A £ (0, A n ) such that 

/(«„+ 1 ) = /(«n) + v/(u n + Ap n ) r (ti „ + 1 - ti„) (3.52) 
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which, using condition (3.50), yields, 

/(«n+l) - /(tin) = V/K + Ap n ) r (Un +1 - «„) > /9V/K) T ^. (3.53) 

This implies that 

[V/(tin + A Pn ) - V/(Un)] r #n > (0 ~ l)V/(t S n > 0. 

Therefore, 

0 < (/3 — l)u)„<r„ < W„||V/(Un + Ap n ) - V/(u„)|| 2 

< 7A||Pn||2Wn < 7<*'n* 


So, 


> 


0-1 

7 


<fn > 0 


and 

0-1 2 n 
< “ < < 0 . 

7 

Hence, u> n a n -» 0 implies <r n — ► 0 as n oo. □ 

Note that cr n = Vf(u n ) T S n /w n -> 0 does not imply that V/(u„) 0 as n ->oo. However, 

it will as long as the angle between V f{u n ) and 8 n is bounded away from 90°. It is possible 
to guarantee this is the case in the inexact Newton setting as was shown by Lemma 3.4. Note 
also that V/ n — > 0 does not imply F( u n ) — *■ 0, without some additional assumptions, e.g., as in 
Corollary 3.10. 

If conclusion (b) holds in Theorem 3.12 with ^(un) + J(u n )£ n ||2 < J7||F(u n )|| 2 for all n, where 
tj G (0, 1) and if we assume that the condition numbers M n = condj( J(tin)) me uniformly bounded 
from above, then (3.15) shows that <r n -► 0 as n -► oo does imply V/(u n ) ->• 0. However, the 
conditions are too wesdc to imply that {u n } converges. 

We should also point point out that the conclusions of the above theorem hold for a sequence 
generated by Algorithm 3.1. 


3.2 Superlinear convergence of inexact Newton sequences 

In this subsection we will require that t] n — ► 0 as n — v oo. As noted in Theorem 2.1, given that 
the sequence of inexact Newton iterates converges, this additional assumption on the 7j n ’s implies 
that the convergence is at least superlinear. The main result of this subsection is a modification 
of a theorem obtained by Dennis and More [10], and shows that the global strategy based on the 
above a— and (3— conditions will permit full inexact Newton steps when close to a minimizer of /, 
provided that a < | and 0 > |. 

Theorem 3.13 Let F : 6e twice continuously differentiable in an open convex set 

D C R^, and for f = \F t F assume V 2 / G Lip y (D). Consider the sequence {u n } generated by 
u n+ i = u„ + A n p n> where ||F(u„) + J(tin)p„|| 2 < 7 n||f(«n )||2 for all n with 0 < 7? n < t? < 1 for all 
n, and A n chosen so that (3.49) and (3.50) hold with a < | and 0 > \. 

If tin — ► u, G D with J(u,) nonsingular , then F{u ,) = 0. If in addition, q n — > 0 as n — ► oo, 
then there exists an no >0 such that for all n > no, A n = 1 is admissible (i.e., satisfies conditions 
(3.49) and (3.50)). Furthermore , t/A n = 1 for all n > no, then u n — ► u* superlinearly. If also 
rj n = 0(||F(u n )|| 2 ), then the convergence is quadratic. 
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Proof: Since /(u) = |F(u) r F(u) > 0, and since ||.F n + «7nPn||2 < *?||-fn ||2 (where F n — F(u n ), 
J n = J(un), etc.) implies V/J p„ < 0 for all n, with 77 G (0, 1), we have by Theorem 3.12 that 


lim^f = 0 . 

n - > °° llPn||2 


(3.54) 


If Un — ► «* G D with J„ = J{ tt,) nonsingular, then by continuity M n = cond2(Jn) — 

cond 2 (./,), and so the sequence {M n } is uniformly bounded from above. Thus, the discussion after 
the proof of Lemma 3.4 implies that V/(u n ) — * 0 as n — *• 00 . Hence, again by continuity 

0 = V/(u.) = 

which gives F(u,) = 0. 

Next, we show that ||p n ||2 -+ 0 as n -» 00 . Since 

Pn = -Jn lp n + + Jnpn), 

we have 

< (1 + i)K’ ll> -IIF.il> 

Thus, ||F „||2 -* 0 implies ||p „||2 -* 0 as n — > 00 . Also, 

||V/„|| 2 = > ||Jl||,-' • ||F.||, = IIAUr 1 • ||F„|| 2 . 

Hence, by Lemma 3.4 we have 

-v/„V 1 - v 


M> “ (l + V)M r 


IWI, • l|F.|l„ 


where M n = cond 2 (J n )- Thus, 


11 p || ^ (1 "h V)M n 11 j || ^ fn Pn 

l|jy1, < , r~H / "ll > iip.ii, ■ 


and so 


\\F n \\ 2 ■ \\ Pn \\ 2 < ^ * (~V/Jp„) = ~^fnPn, 

1-T) 

letting a n = ~^M n • 1 1 */n 1 1 2 - In addition, it immediately follows from (3.55) that 


tlPnlll < ^i±^M“(-v/Jp„) = 
1-77 


(3.55) 


(3.56) 


(3.57) 


letting b„ = 

Using (3.56) and (3.57), and the fact that ||p n ||2 0, we next show that A n = 1 satisfies 

conditions (3.49) and (3.50) for n large. First note that if F = (jFi, • • •,i'V) r , then 


N 


V 2 /(u) = J(u) r J(u) + X) i! i(“) v2 ^( u ) 

i= 1 

= J(u) T J(u) + S(u). 
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Also note that ||5 , (tt n )|| 2 ->0 asn->oo since and F(u.) = 0. For each n, by the mean 

value theorem there exists a u„ on the line segment between and u* + p n such that 

/(«n + Pn) - f(Un) ~ ^ V/(«n) T pn = ^ (V/„ + V 2 /(«n)p„) Pn- 

This then gives 

/(«n + Pn) - /(«n) ~ ^V/(u n ) r p„| = | |(V/ n + V 2 /(«n)p n ) p n 

= \ (v/ n + V 2 / n p n ) T p„ + ip£ (v 2 /^) - V 2 /„)p„ 

< \ (lkJ(F n + JnPn ) 1 1 2 ' ||Pn|| 2 + (\\S n \\ 2 + T l|Pn|| 2 ) | |l>« 1 1 1) 

< \ {VnPnh • ||F n || 2 • \\pnh + (||Sn|| 2 + 7 l|Pn|| 2 )||p«||f) 

< -I( arl77n ||J n || 2 + 6„(||S n || 2 +7|lPn|| 2 ))V/Jp„ 

1 rp 

= -^nVfnPn, 

where we have used (3.56) and (3.57), and defined e„ = a n q n ||J„||2 + &n(||‘Sn|| 2 +7||pn|| 2 )- Therefore, 

^(1 + e„)V/Jp n < f{u n + p„) - f n < ^(1 - «n)V/Jp„. 

Next, note that since a n , b n and ||J n |[ 2 are all bounded from above, and since q„, ||$ n || 2 and ||p n || 2 
all converge to 0 as n — ► oo, we have that e n — ► 0. So, choose no > 0 so that for all n > no we have 

e n < min{l — 2a, 2/3 — 1}. 

It then follows that for all n > no 

PVfZPn < f{u n + Pn) ~ fn < aV/JPn- 

Thus, A n = 1 is admissible for all n > n 0 .The superlinear (quadratic) convergence of the sequence 
follows from Corollary 3.5 in [9] or Theorem 2.1 above. □ 

One cam relax the condition that q n — *■ 0 in the above theorem somewhat, although the resulting 
condition on q is not computationally feasible in general. 

Corollary 3.14 Let F : R^ — » and let f = \F T F be twice continuously differentiable in an 

open convex set D C R^ with V 2 / 6 Lip^(D). Assume that M = sup ue jj{cond2(J(u))} < oo and 
that K = sup ueJD {||J(u)|| 2 } < oo. Consider the sequence {u n } generated by = u„ + A n p„, 
where ||F(u„) + J(u n )p n || 2 < q||F(u n )||2 for all n with 0 < q < 1 for all n, and A n chosen so that 
(3.49) and (3.50) hold with a < 5 and /3 > 5. 

Ifun—yu+£D with J(u *) nonsingular , then F(u*) = 0. If in addition, rj satisfies 

• M • K < min{l — 2a, 2/3 — 1}, (3.58) 

1 - q 

then there exists an n 0 > 0 such that for all n > n 0 , A n = 1 is admissible (i.e., satisfies conditions 
(3.49) and (3.50)). Furthermore, if A n = 1 for all n> no, then Un-m, linearly. 
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Proof; From the proof of Theorem 3.13, the corollary will be true if we have 

On» 7 ||^n ||2 < min{l - 2a, 2/3 - 1}, 

for all n. But this follows immediately from the definitions of M and K , and condition (3.58). □ 
One may wonder whether or not the conditions required in the theorem are too strong if we 
want to ensure that A n = 1 is admissible. More precisely, does the weaker condition ||F , „ + J n Pn ||2 < 
q||F n || 2 for all n, where q G (0, 1), allow the existence of sequences u„ + i = u n + A n p n converging 
to a u. for which A„ = 1 is admissible for all large n. The answer is no as is illustrated in the 
following example. 

Example: Consider the one- dimensional function F(u) = u 6 R. Choose q G (0, 1) and 
0<a<!</3<lso that 

max{l — 2a, 2/3 — 1} < q. 

Choose 6 so that 0 < 6 < 1 and 


max{l — 2a, 2/0 — 1} < 1 — 0 < q. 


Consider a sequence (u n ) generated by u n+ i = tin + A n p„, where uo = 1, p n — -0u*i> and 
A n = (2 - a - (3)/6 for all n. Then f(u) = |u 2 with V/(u„) r p n = -6u 2 n , and so p n is a 
descent direction for all n. Also, ||jF n -f J n p n ||2 = |(1 — 0)«n| < , ?ll-f , ( u n )||2 = vWn] for all n • 
We next show that A n is admissble for all n . We have 

/(«„ + Ap„) = ^(u n + A n p n ) 2 = ^u 2 n - \0u 2 n + -\ 2 9 2 u 2 n . 

Next, 

/(«n) + A aV/(tin) r p„ = ^u 2 n + Aa(-0u 2 ), 

and 

/(«n) + A /3V/K) r p„ = -u 2 n + m-9u 2 n ). 

Thus, A is admissible if 


+ A 0(-6u 2 n ) < Ju 2 - \du 2 n + Ja 2 6 2 u 2 n < \u 2 n + Aa(-0u 2 ), 


2(1-0) . 2(1 -a) 

8 <A< 8 ' 

Clearly, the A « defined above is admissble for all n. However, for the parameters given above 
we have 1 < and so A = 1 can never satisfy conditions (3.49) and (3.50). Notice that 

u n +! = (a + (3 — 1 )u n , and so = (a + (3 — l) n uo with |a + 0 — 1| < 1. Hence, tin — ► 0 as 
Ti — ► 0 linearly. 


Note that the convergence results of this section are similar to others in the literature. However 
the emphasis was put on the additional residual norm condition (3.28). As was seen, slightly 
different, and somewhat stronger, results can be shown in the situation where S n satisfies a residual 
norm condition. 
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For the purpose of illustration, we end this section by describing the particular backtracking 
algorithm we have used in [5]. The selection procedure for A is modelled after that in [11]. 

Algorithm 3.2: Backtracking Procedure #2 

1. Choose a 6 (0, |) and /3 6 (j, 1). 

2. Given Un the current Newton iterate, find in some unspecified manner p with \\F n + J n p \\2 < 

U-fnlU- 

3. Find an acceptable new iterate u n+ i = u n + A p. First, set A = 1. Define u(A) = -f Ap. 

a. If u(A) satisfies (3.49) and (3.50), then exit. If not, then continue. 

b. If u(A) satisfies (3.49), but not (3.50), and A > 1, set A <— 2A and go to (a). 

c. If tt(A) satisfies (3.49) only and A < 1, or u(A) does not satisfy (3.49) and A > 1, then 

c.l If A < 1, define A io = A and A w = last previously attempted value of A. If A > 1, 
define A/ 0 = last previously attempted value of A and A hi = A. In both cases, u(Aj 0 ) 
satisfies (3.49) but not (3.50), u(Aw) does not satisfy (3.49), and A/ 0 < A«. 

c. 2 Find A 6 (A/ 0 ,A W ) such that u(A) satisfies (3.49) and (3.50) using successive linear 

interpolation. 

d. Otherwise (u(A) does not satisfy (3.49) and A < 1), decrease A by a factor between 0.1 
and 0.5 as follows: 

d. l Select the new A such that u(A) is the minimizer of the one dimensional quadratic 

interpolant passing through /(«n), f(«n) = V/(u n ) r p and f(tin + Ap). Then take 
the maximum of this new A and 0.1 as the actual value used. (One can show 
theoretically that the new A value so chosen will be less than or equal to one-half 
the previous value.) 
d.2 Go to step (b). 
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4 Global convergence results for model trust region techniques 

In [5], we presented a trust region algorithm based on a Newton-GMR.ES iteration. We give here 
a convergence theory for this algorithm and other methods based on projection principles. We 
start by describing in Section 4.1 general trust region methods and give some background on their 
theory. Our basic approach will be modelled after the work of Schultz, Schnabel and Byrd [26]. In 
Section 4.2 we will adapt this theory to the particular case that is of interest to us, namely the case 
in which a projection method onto a lower dimensional subspace is used. 

4.1 General trust region techniques for nonlinear optimization 

The model trust region algorithm generates a sequence of points u„, and at the nth stage of the 
iteration a quadratic model of / : — > R near the current iterate Un is used which has the form 

V’n(w) = fn + 9n W + ^W T B n U >, 

where f n = /(tin), 9n = V/(u n ), and B n « V 2 f(u n ). (A projected analogue of the above function 
based on approximations from the subspace K will be developed later.) At this stage an initial 
value for the trust region size r n is also available. An inner iteration is then performed which 
consists of using the current trust region size r n and the information contained in the quadratic 
model to compute a step 

Pn{ T n) = ^n)* 

Then a comparison of the actual reduction of the objective function 

ared^Tn) = f n - /(% + p n (^n)) 

and the reduction predicted by the quadratic model 

Pred n(r n ) = f n - ^ n{Pn{r n )), 

is performed. If there is a satisfactory reduction, then the step can be taken, or a possibly larger 
trust region used. If not, then the trust region is reduced and the inner iteration is repeated. 

For now we leave unspecified what algorithm is used to form the step computing function 
j>( 0 ,I?,t), and how the trust region size or radius r n is changed. We also leave unspecified the 
selection of B n except to restrict it to be symmetric positive definite. Details on these options 
will be given later. Shultz, Schnabel and Byrd [26] describe an abstract trust region algorithm as 
follows: 


Algorithm 4,1: General Trust Region Algorithm 

1. Choose 71, ai, at2 £ (0, 1), U\ £ R^, T\ > 0, and let n = 1- 

2. Compute f n — /(u n ), g n = g(tin) = V/(u n ), and B n £ H NxN symmetric and positive 
definite. 

3. Find r n and compute p n = p n ( r n) satisfying: [|p n ||2 ^ T n and 
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« piajS} £ «* “ d 

(b) either r n > r n _i, or 

• T n > ||.B“2i0n-l||2, or 

. for some r < r n / 7l , < «2 or < a2 ' 

4. Let u„+i = Un+Pn and n = n + 1. 


5. Go to Step 2. 

The conditions that the step selection function p{g, B,t) must satisfy will now be considered. 
Defining 

pred(< 7 , B, r) = -g T p{g, B, r) - -p(g, B, r) T Bp(g, B, r ), 
the conditions that we consider are: 

Condition 1 There exist two scalars c x , o\ > 0 such that for all g G for all symmetric positive 
definite B G B, NxN , and for all r > 0, pred (g,B,r) > ci||0||2nrin{T,<7i||<7||2/[|2?||2}. 

Condition 2 If B is symmetric positive definite and ||l? -1 y ||2 < t, then p(g,B,r) = —B~ 1 g. 

The first condition requires that the predicted decrease be at least as large as a given multiple 
of the TnrnimiiTi decrease that would be provided by a quadratic search along the steepest descent 
direction. The second condition forces the direction p to be equal to the Newton direction whenever 
the next point u n -f p lies in the trust region. The following result is given in [26]. 

Theorem 4.1 Let f : TL N —yR.be twice continuously differentiable and bounded below, and let 
V 2 /(u) satisfy ||V 2 /(u)|| 2 < /3 X for all u G R^. Suppose that an algorithm satisfying the conditions 
of Algorithm 4-1 above is applied to f(u), starting from some «i G R^, generating a sequence {un}, 
n = 1, 2, • • •. Then 

(i) Ifp(g,B,r) satisfies Condition 1 with ||R n ||2 < /?2 for all n, then g n — > 0. 

(ii) If p(g,B,r) satisfies Conditions 1 and 2, B n = V 2 /(it„) for all n, V 2 /(u) is Lipschitz 
continuous with constant L, and u* is a limit point of {u n } with V 2 /(u.) positive definite, 
then tin converges to u* q-quadratically. 

The first result in the theorem gives the first order stationary point convergence of the sequence 
of iterates while the first and second results taken together give second order stationary point 
convergence. 

We next discuss the procedures normally used to form the step computing function p(g,B,r). 
One way in which this can be done is to take p(g, B, r) to be the solution of the minimization 
problem 

min ip(w), where ip(w) = f + g T w + -w T Bw. 

IM|a< T 1 

Assuming B is symmetric and positive definite, the solution to this problem is given by 

-/'!_/ ~( B + with IK - 8 + W) -1 ffll 2 = r when pr ',112 > r, and 

“ [ -B~ 1 g, when ||R _1 ff||2 < r. 
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We note that the p in the first part of the definition of p is unique. It is well-known that for 
p(g, B, t ) = p(r) the following inequality is true: 

f~4 > i||</|| 2 min{r,||^}. 

For a proof of this result see [17]. With this ideal choice of the direction p(g, B, r), Condition 1 is 
trivially satisfied. 

Since there is no finite method of determining p such that ||(I? + pl)~ 1 g || 2 = t when r < 
|| 5 _1 5 || 2 , frequently a piecewise linear approximation to p(p) is used. The dogleg strategy of Powell 
[22] is an example of such a procedure. (See [11] for a discussion of this and other dogleg strategies.) 
If we denote Powell’s dogleg solution by p(r), then it is also well-known that for p(g, B,t) = p(r ) 
the following lemma is true. 

Lemma 4.2 Let p(r) be the dogleg solution to the minimization problem 

min (4.59) 

where ip(w) = / + g T w + \w T Bw with B symmetric positive definite. Then 

/-*Wr))>i||j|| jm in{r,J!||}. (4.60) 

For a proof of this lemma see Powell [23]. Again, a consequence of the above lemma is that 
Condition 1 is trivially satisfied, and as a result of Theorem 4.1 global convergence will take place 
under the mild condition that j|l? n || 2 remains bounded. 

4.2 Application to projection methods for nonlinear equations 

In the context of nonlinear equations, one typically bases the global strategy on the related function 
f{u) = ^F(u) T F(u). Letting u be the current approximate solution, the local quadratic model now 
has the form 

4>(w) = f + g T w + -w t Bw, 

where / = f(u), g = V/(u) = J(u) T F(u ), and B approximates J(u) T J(u). Note that V 2 /(u) = 
J(u) T J(u) + Y^iLi -F’»(«)V 2 jFi(u), and so in general V 2 /(u) = J(u) T J(u) only when F(u) = 0. 
When using projection methods to solve the nonlinear system, the full quadratic model ip(w) is 
replaced by a quadratic model on a lower dimensional subspace K. Letting the columns of the 
N x m matrix V form an orthonormal basis for K (with m the dimension of K), we have 

<j>{y) = ip(Vy) 

= f + g T Vy + \(Vy) T BVy 

= f + (V T g) T y+\y T (V T BV)y. 
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Thus, V<£(0) = V T g. Note that since V has orthononnal columns, the matrix V T BV is symmetric 
positive definite whenever B is. In the current setting, we will take B = J{u) T J(u). If q(r ) is the 
dogleg solution to the minimization problem 


min <£(y), 
llvlb<r 


then Lemma 4.2 implies 


We then have 


}• 


f-<t>{q{r)) > \\\V T g\\ 2 Tmn{T, ^tbv\\ . 


/ - ^(g(r)) > ^||F r p|| 2 nun{r, 


(4.61) 


using the fact that \\V T BV\\ 2 < \\B\\ 2 . 

In order to be able to apply the results of Shultz, Schnabel and Byrd [26], we must convert the 
lower bound in the above inequality to one involving ||j|| 2 > and not ||F r < 7 || 2 . As indicated above, 
we have / = \F t F, said so xp(w) has the form 


H w ) = f + ( J T F) T W + it v T (J T J)w, 


which gives g = J T F and B = J T J. Thus, we need a lower bound for \\V T g \\ 2 = \\V T J T F\\ 2 . Such 
a lower bound was derived in Section 3. Indeed, Corollary 3.5 states that 


(4.62) 


(4.63) 


\\v T Vfh = \\v T sh > 

provided that there exists at least one vector p in K such that 

nn«) + j(v)ph < •riifMii* 

Therefore, we immediately have the following lemma. 

Lemma 4.3 Let J £ H NxN be nonsingular and F 6 be given . Let 17 £ [0, 1) be chosen and let 
K be a subspace of dimension m in H N such that 

min||F + Jp || 2 < i?||F|| 2 . 

Choose the N x m matrix V so that its columns form an orthonormal basis for K . Let g(r ) be the 
dogleg solution to 

min 4>{y ) where <f>(y) = f + (V T g) T y + ~y T {V T BV)y, 
llvlh<T 1 

with f = \F t F, g = Vf = J t F and B = J 7 J . Then 


||F r ff ||2 > <?\\gh, 


(4.64) 


25 


and 


(4.65) 


/ - <Kq(r)) > i<r||g|| 2 mm{r,<r|j^}, 


where 


a = 


1 - V 

(1 + V)M' 


(4.66) 


Given this lemma it is now possible to state a model trust region algorithm appropriate for 
use with a general projection method. The algorithm is stated in terms of a sequence of general 
subspaces K n . 

Algorithm 4.2: Inexact Newton Trust Region Algorithm 

1. Choose an G (0, 1). 

2. Choose 71, ati, a 2 G (0, 1), G R^, T\ > 0, find let n = 1. 

3. Compute F n , J n and choose rj n G [0, r/max). Then choose a subspace K n C R^ satisfying 

• min^*-,, || F n + J n p \\ 2 < 

Let m n be the dimension of K n , and build V n G R 7Vxm " whose columns form an orthonormal 
basis for K n . 

4. Compute f n = \F?F n , Vjg n = Vjj%F n , and B n = V£jTj n V n with J n nonsingular. 

5. Find r„ and compute p n = V n q n = V n q n (r n ) satisfying: ||p n || 2 < r n and 

(a) and 
v ' pred n (r„) - 

(b) either r n > r n _!, or 

• T n > or 

• for some T S W7i. < “» OI < 

6. Let Un+i — u n + p n and n = n + 1. 

7. Go to Step 3. 

In the above algorithm, the step selection function q n {Vn 9n, B n ,r n ) is given by 

Qn{V n §m ?n) = 9n( T n) 

where g n (r n ) is the dogleg solution to the minimization problem 

min <j> n {y) with <f> n (y) = 4> n {V n y) = /„ + {V*g n ) T y + \y T B n y. 


Then by Lemma 4.3 we have 


fn ~ <M?n) > ^n||?n||2min{T n , }, 


(4.67) 
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where 


1 Vn 

(1 4" *?n)-^n 

with M„ = cond 2 (J n )- We now state the main result of this section. 

Theorem 4.4 Let F : H. N — » H N be twice continuously differentiable. Define f = | F T F and 
assume ||V 2 /(u)||2 < fix for all u E R^. Suppose that an algorithm satisfying the conditions of 
Algorithm 4-2 above is applied to f(u), starting from some uj E R^, generating a sequence {tin}, 
and assume that ||R n ||2 < 02 and cond 2 (J n ) < M for all n. Then 

(i) 9n -*■ 0. 

(*) // V 2 /(u) is Lipschitz continuous with constant L, and u* is a limit point of {u n } with J( u*) 
nonsingular, then F(u ,) = 0. If in addition, the sequence rj n — *■ 0 as n —* oo, then u n 
converges to u, superlinearly. Also, if T] n = 0(||F(ti n )||2), then the convergence is quadratic. 

This theorem is a direct adaptation of Theorem 4.1 above. The proof uses the previous lemma, 
and due to its length is deferred to an appendix. 

The actual dogleg algorithm we have used in [5] is modelled after that in [11], and is described 
below. The condition for accepting u n+ i is the a-condition in §3, namely 

/(«n + *) < /(tin) + aV/Kftf, 

where 6 = Vq and the columns of V form an orthonormal basis for the Krylov subspace K. The 
vector q will be that point on the dogleg curve for <f>(y) such that ||g|| 2 = r, where r is the current 
trust region radius. The algorithm is then as follows. 

Algorithm: Dogleg 


1. Choose a E (0, |). 

2. Given u„, the current Newton iterate, calculate 6 GM = V yGM- Here, yaM is calculated using 
the GMRES method (without restarting) with initial guess 6^ = 0, and it is assumed m is 
large enough so that ||F n + J n S GM || 2 < JfaH-Fnlh- 

3. Given r, the current trust region size, calculate q, the point on the dogleg curve for <f>(y) with 
||g|| 2 = r. Then calculate u„ + i = it n + Vq. If it n +i is acceptable, then go to step(5). 

4. If u n +i is not acceptable, then do one of the following: 

(a) If r has been doubled during this iteration, then set u„ + i equal to its last accepted value 
aind set r *— r/2. Then continue to the next Newton iteration. If not: 

(b) Determine a new r by using the minimizer of the one dimensional quadratic interpolating 
/(u„), /(u n +i), amd the directional derivative of / at Un in the direction 6 = Vq. Letting 
X be the value for which Un + A£ is this minimizer, set r <— A||£|| 2 , but constraining it 
to be between 0.1 amd 0.5 of the old t. Then go to Step 3. 

5. For an acceptable u n +i> calculate ared n (r) amd pred n (r). Then do one of the following: 
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(a) If aredn(r) and pred n (r) agree to within relative error 0.1, and r has not been decreased 
during this iteration, set r *- 2 * r, and go to step (3). If not: 

(b) If ared„(r) < 0.1 * pred n (r) set r = r/2, or if ared n (r) > 0.75 * pred n (r), set r = 2 * r. 
Otherwise, do not change t. Then continue to the next Newton iteration. 

Note that the a-condition is equivalent to the condition in step 5(a) of Algorithm 4.2, and -that 
the conditions for decreasing the size of the trust region in 5(h) are met. For more details on 
permissible trust region updating strategies, see §3 in [26]. 
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5 Applications to Newton-Krylov methods 

In this section we show how the theory outlined in the previous sections applies to the Newton- 
Krylov methods discussed in §2. The details of the implementation of the Newton-Krylov algorithms 
when combined with either a linesearch backtracking strategy or model trust region are discussed 
at length in our earlier paper [5], and so we will not go into them here. As was seen above, the 
most important condition to guarantee convergence is that the residual norm be reduced by a 
certain amount. This was crucial in both the linesearch methods and the trust region methods. 
Unfortunately, as we indicate below, there is always the possibility of stagnation when using either 
linear Amoldi or GMRES, and as a result there remains the possibility of a breakdown in the 
nonlinear iteration. In these situations the basic residual condition (2.1) may not be satisfied using 
a single subspace. We begin this section by giving sufficient conditions under which stagnation of 
the linear iteration never occurs. 

The simplest condition is simply to ensure that the steepest descent direction — g = -J T F 
belongs to the subspace K. Then the minimum of \\F + Jp\\2 for p E K , is reduced from its value 
of ||-P||2 when p = 0 to an amount not less than that obtained by a steepest descent step, i.e., 

||F + J* < ||F + XJJ T F\\, < ~ \ ll-flb- (5-D 

As a result, if the steepest descent direction is known, or computable, it suffices to add it to the 
subspace to guarantee the existence of an 77 6 (0, 1) for which the residual condition is always 
satisfied. The resulting modification of the underlying algorithms is very simple. 

A particular case of the above situation is when uj = ±F/||F||2 and the Jacobian J is symmetric. 
Then the Krylov subspace K m will in fact contain the steepest descent direction for the function 
/ = \F T F for any m > 2. This is because 

V/ = J t F = JF = ±||F|| 2 J Vl 

which, apart from a scaling factor, is the second vector of the Krylov sequence. As a result, 
stagnation can be avoided if J is symmetric and a Krylov subspace of dimension m > 2 is used. 
Analogous arguments also guarantee that the steepest descent direction lies in K whenever J is 
skew- symme trie or, whenever J — I 4- a5, with 5 skew-symmetric, and a real. More generally, if 
there is a polynomial q of degree m such that J T = q( J) then again the steepest descent direction 
lies in K m . However, this is equivalent to the property that J is normal, see [13] for details. 

A third case for which there is no difficulty is when the Jacobian at every point is positive 
real, i.e., J + J T is symmetric positive definite. It has been shown by several authors that, in this 
situation, linear Krylov subspace methods will converge. Thus, if one required the Jacobian matrix 
J(u) to be positive definite for all u G R/^, then the residual conditions in the main results of 
the previous sections can be satisfied, and so convergence of the sequence of nonlinear iterates will 
follow. This will be the case for some problems, but it is clear that requiring J(u) to be positive 
definite everywhere is very restrictive. 

For problems whose coefficient matrices are indefinite, the convergence theory for the linear 
Amoldi and GMRES algorithms is lacking in one very important area at the present time in that 
it is impossible to predict how well either algorithm will perform on the problem 

Ax = 6. 
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Some partial answers to this question have been given by several authors. For example, in [2], 
several results are shown which basically show that either both algorithms will perform well on 
a particular problem or both will perform poorly. Some numerical studies performed by Huang 
and van der Vorst [15] suggest that the convex hull of the upper Hessenberg matrix H m in the 
Araoldi process must well approximate that of the matrix A in order for the GMRES solution to 
well approximate the true solution of the above linear system. 

There also are generalizations of the above results for the linear Krylov methods when the 
coefficient matrix is positive definite. We now show a result which gives necessary and sufficient 
conditions for the first k steps of the linear GMRES algorithm to stagnate. 

Theorem 5.1 Consider the solution of Ax — b using GMRES with initial guess x 0 and residual 
r 0 = b- Ax 0 . Let X{ be the i-th GMRES iterate fori- 1 , • ■ • , k. Assume A is nonsingular and let 
p be the minimal degree of A with respect to r 0 . Then for any k < p, we have x^ = x^-i = • • • = *o 
if and only if r^A'ro = 0 for i = 1, • ■ • , k. 

Proof: Let r j = 6 - Axj (j = 1, ■••,&) with k < p. Because j < k < p, the N X j matrix 
Zj = [r 0 , -Ar 0 , ..., j4 fc_1 r 0 ] is of full rank. It follows directly from the remark after (2.7) that Xj = 
xo + ZjVji where yj 6 R J solves the j x j system 

(AZj) T (r o - AZ iVi ) = 0. 


Since Zj is of full rank and A is nonsingular, the above system has a unique solution. Observe that 
Xj = xq if and only if this unique solution is yj = 0. This is true if and only if the right hand side 
( AZj) T r 0 is zero, i.e., if and only if A l r 0 = 0 for i = 1, • • • , j. The result follows immediately. □ 

Using this result, it follows immediately that if A k is either positive or negative reed, then 
GMRES(Jfe') converges for any k l > k. Although interesting theoretically, this result is of limited 
practical application. However, it and the ones referred to above indicate some of the subtleties 
involved in analyzing the convergence behavior of the linear Krylov methods. 

An important issue somewhat related to the question of stagnation is that of preconditioning. 
We mentioned above that adding the steepest descent direction to each subspace K guarantees 
convergence. In fact, using the normal equation approach, i.e., solving the linear systems by using 
the conjugate gradient algorithm on the normal equations, also guarantees convergence. However, 
the convergence can be very slow and this can be just as problematic as divergence. The usual way 
to improve convergence of the linear solvers is to precondition the systems. The key solution to 
avoiding stagnation is to use a good preconditioning technique. The importance of preconditioners 
over the Krylov subspace methods themselves has been illustrated in the tests in our previous paper 
[5]. There are a number of standard preconditioners that can be used when the Jacobian is explicitly 
available, the simplest and often the most inexpensive being the incomplete ILU factorization. 
Unfortunately when the Jacobian matrix is not available, i.e., in matrix-free methods, this is not 
feasible. Preconditionings that do not require the Jacobian explicitly but only its action on a given 
vector could be extremely useful. We should mention that any nonlinear fixed-point iteration can 
be considered as a matrix-free preconditioner [7]. We believe that much research remains to be 
done in this direction. For now, we can say that the best successes of nonlinear Krylov subspace 
methods have been in cases where the particular knowledge of the physics of the problem allows 
one to derive suitable preconditioners [29]. 


30 


6 Conclusion 


We have provided some theory for nonlinear projection methods with emphasis on those methods 
based on Krylov subspaces. The main results are similar to others in the literature, from which 
they have been adapted. 

One of the main restrictions of most of the schemes used is that the subspace onto which a 
given Newton step is projected must solve the Newton equations with a certan accuracy which is 
dictated by the residual condition (2.1). This, as we have shown, is enough to essentially guarantee 
convergence of the standard linesearch and trust region algorithms. On the practical side, the 
mAin difficulty is that one does not know in advance if the subspace chosen for projection will be 
good enough to guarantee this residual condition. Techniques which use restarting of the linear 
iteration can be very useful in this context. Moreover, preconditioning is essential in the successful 
application of these methods. 

Finally, there are generalizations of the Newton- Krylov methods considered in this paper which 
may be more effective on certain problems. For example, once a subspace K has been constructed 
along with an orthononnal basis F, one could consider solving the nonlinear least squares problem 

min f(u+ Vy ), 
yeR m 

where /(u) = \F(u) T F(u) and u is the current approximate solution. This would be in lieu of 
the quadratic models considered in the trust region algorithms. Preliminary testing of such an 
algorithm has been encouraging. We will consider this and other generalizations in future work. 


Acknowledgements. The authors are indebted to Homer Walker for the very valuable help he 
provided them by carefully reading the manuscript and pointing out a few errors. In particular the 
modification of the first step in Algorithm 3.1 was spurred by his remarks. Also, as was mentioned 
earlier, the result in Lemma 3.8 is due to Walker. 
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Appendix 


The proof below follows closely that of Theorem 4.1 given in [26]. The major differences arise 
from the fact that a lower dimensional quadratic model is used, rather than the full iV-dimensional 
model assumed in [26]. 

Proof of Theorem 4.4: By Taylor’s theorem, for any n and any r > 0, 


lared^r) - pred n (r)| = 


< 


fn ~ /(« n + Pn(T)) ~ (/n ~ fn ~ SnPn( T ) - ip n (r) r B n p n (r)^ | 

\pn{r) T B n p n {r) - f P„(r) T V 2 /(«n + £Pn(T))Pn(r) • (1 - {)<*£ 

lbn(r)||l f 1 II B n - V 2 /(«n + fr„(r))|| 2 (l - 0«. 

Jo 


So, 


aredn(r) 

predn(r) 


-1 


IMlM Jo II Bn ~ WK + ^n(r))|| 2 (l ~ M . 
|predn(r)| 


(A.1) 


Also, note that for any sequence {u n } generated by an algorithm satisfying the conditions of 
Algorit hm 4.2, the related sequence {/ n } is monotonicaUy decreasing and bounded from below. 
Hence, f n converges to an /* as n goes to infinity. This fact will be used in the remainder of the 
proof. 

Proof of (i): Since rj n < T] m ^ x < 1 for all n, we have 




1 f/max 

(1 H" 


= o > 0 for all n. 


;From (4.67) it follows that 

/.-Wt,(r))> ^|| S „|| ! mta{T,s|f*-}for i 01n. (A.2) 

2 \\JJn\\2 

Next, consider any k with ||jfc|| 2 ± 0. For any u, ||<7(u) -0*|| 2 < /?i||u- uitlh- So, if ||u-ujfe|| 2 < 
lbfc|| 2 /(2^i), then 

llsWIl! > Ml: - lls(«) - »l| 2 > ^ 

Let R = ||5fc|| 2 /2 and D R = {u: ||u - ti fc || 2 < £}. 

At this point there are two possibilities: either Un £ Dr for all n > fc or eventually {u n } leaves 
the ball. We show the latter is true by contradiction. Suppose tin € Dr for all n > k. Then for till 
n > k, 1 1 g n 1 1 2 > ||<7fc|| 2 /2(= e). Thus, by what was shown above, 

pred n (T) > ^|| 5n || 2 min{r,a|^j^} 

> ffcmin 
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for all n > ife, since \\B n \\ t < fa and ||y n || 2 > e implies ||0 n || 2 /||Bn||a > € /fa- To simplify notation 
let 6 = at. Then, using the above inequality we have 

1 arednCr) 


predjr) 


< jMr)! Jo \\B n ~ V a /K + fr»fr))||a(l ~ (W 

~ 8Tmn{T,6/fa} 

< t! (ft + ill 

~ 8 min{r, 8 /fa} 

^ rjfa + § 2 ) 


for all n > A: and r < 8/ fa. This gives for r sufficiently small and n> k that 

ared n (r) 


P r ed Jr) 


> OC2* 


In addition, we have 


11/TTi D 1/ II ■> Pnjfa|j« > -Jlffnlil > 

li (Vn £nFn) Fn 5,1,12 - \\VjB n V n \\ 2 ~ ||Bn|| 2 - /3a’ 


so that for r n sufficiently small none of the conditions allowing decrease of r n in 5(b) of Algorithm 4.2 
above can hold. It follows that r n is bounded away from 0. But, since 


fn ~ fn+ 1 = ared„(T n ) > aipred n (r n ) > ai#min{r n , — }, 


(A.3) 


and since / is bounded from below, we must have r n — » 0, which is a contradiction. Therefore, 
{«„} must eventually be outside Dr for some n > k. 

Let / + 1 be the first index after k with uj + i not in Dr. Then 

/(u fc ) - /(« f+ i) = E/K)-/K +1 ) 

n=k 

l 

> «iPred n (r n ) 

n=k 

i « 


> ^2 a i& min{r n , — }. 


V=k 


Now, if r n < 8 j fa for k < n < /, we have that 


/(u fe ) - /(tii+i) > Yl Tn - 

r\—k 

Otherwise, we have that /(ti*) — /(u; +1 ) > a\8 2 j fa. (That is, there exists at least one n with 
r n > 8/ fa.) In either case, 

f( u k)~f{ui+ 1) > ai5min{ii, —} 

. J\9kh __,_ r *\\9kh ^llflfclla, 

= min 

1 1 


> WSkWl*!* -min{— 


33 


By assumption, / is bounded below, and by construction /„ is monotonically decreasing. These 
imply that f n — ► /*. Then by the preceding inequality 


I MI < 


i 1 


ai " 


(/»-/.)■ 


Therefore, g n -* 0 as n -► oo. 

Proof of (it): By assumption, u, is a limi t point of {u n }. Let {tin, } be a subsequence' converging 
to u,. We show first that tin converges to u*. By (i), <j(u*) = 0 . Since J(u«) is nonsingular and 

0 = g(u,) = J(u,) r F(u,), it follows that F(u+) = 0 . We then have V 2 /(u*) = J(u,) r J(u,), and 
hence is positive definite. Since V 2 / is continuous, there exists a S\ > 0 such that if ||u — u«||2 < Si, 
then V 2 /(u) is positive definite, and if u ^ u* then ff(ti*) ^ 0 . Let Di = {u: ||u - u*||2 < tfi}. 

Since </(u,) = 0 , we can find 62 > 0 with 62 < £i /4 and ||V 2 /(u) _1 ||2 • ||ff(«)||2 < S\/2 for all 
u G I >2 = {« : ||« - «.||2 < $ 2 }- 

Find jo such that /(«n J0 ) < inf{/(u) : u G D\ — D2} and u„ Jo 6 f?2* Consider any uj with 

1 > rtj 0 , ui £ Z?2- We claim that u/+i € D 2 , which implies that the entire sequence beyond u nj# is 
in Z?2’ Suppose that uj+i is not in D 2. Since fi+\ < f n j 0 , t*i+i is not in D\ either. So, 

6 3 

n > ||« j+i - « i|| 2 > || m+i - «*||2 - ||«j - «.||2 > «i - j = ^1 

> | > ii^r'ii! ■ [ij(«oii! 


But, since the inexact -Newt on step is within the trust region, we have 


Pi(n) = -Of BiViY'v? «/(«,). 


Since ||p/(7})|| 2 < it follows that u*+i G D u which is a contradiction. Hence, for all n > n JO , 
Uji G -D25 and so since f(u n ) is a strictly decreasing sequence and v* is the unique minimizer of / 
in Z?2> we have that Un converges to u *. 

Next, we show that the convergence rate is superlinear. This is done by showing that eventually 
||(V^H n V P n )" 1 V^5n||2 will always be less than T n , and hence inexact-Newton step will always be 
taken. Since J(u*) is nonsingular, it follows from the results in [ 9 ] that the convergence rate of u n 
to u* is superlinear. 

To show that eventually the inexact-Newton step is always shorter than the trust radius, we 
need a particular lower bound on pred n (r). By the assumptions of (ii), for all n large enough, 
B n = V 2 /(u n ) is positive definite. Hence, either the inexact-Newton step is longer than the trust 
radius, or p n { T ) is the inexact-Newton step. In either case, 

lla.MII, < ll(v„ T B„v„)- 1 v n I V n || I < ||(v„ r B„r„)-'|| 2 ||v„ T j n ||2> 


and so it follows that ||F n r 5 n || 2 > ||Pn(T)||2/||(V,f £„V„) - 1 ||2. By what was shown in the proof of 
(i), for all n large enough we have 


P red n( T ) > 


1 JMt)|12 
2 HBn 1 ^ 


min{r, a 


JIMi 

lia.ll* 


} 


34 


. 1_„ „ . fM ( ,,, - IMr)|| 2 , 

> 1 9 ||Pn(r)|l| 

- 2 M*\\B;%' 


So, by the continuity of V 2 /, for all n large enough, 


IMr)||! 

pred n (T) > 7 1 


4 || V */( m )- 1 || 1 

Finally, by the argument leading up to (A.l) and Lipschitz continuity, 


|aredn(r) - pred n (r)| < ||p n (r)|||^. 


Thus, for any r > 0 and n large enough, 


ar edn(r) _ 
pred n (r) 


< 


< 


2X||VV(u.)-H ; ^( t) || ; 
2Xi| v*fM~ l b Tm 


Thus, by step 5 (b) of Algorithm 4 . 2 , there is a f such that if r n _i < f, then r n will be less 
than r n _i only if r n > |](V^l 1 S n _iF„_i)" 1 V r ^L 1 ff„_i||2- It follows from the superlinear con- 
vergence of the inexact-Newton method that for ti„_i close enough to u, and n large enough, 
\\(Vj B n V n )~ 1 Vj g n \\2 < H(F n r _ 1 5 n _ 1 F n _i)- 1 F n r 5n _i||2. Now, if r„ is bounded away from 0 for all 
large n, then we are done. Otherwise, if for an arbitrarily large n r n is reduced, i.e., r n < r n _x, 
then we have 

r n > ||(V n r _ 1 5 n _ 1 F n _ 1 )- 1 F n r _ l5n _ 1 ||2 > \\(VjB n V n )- l V?g n \\ 2 , 

and so the full inexact-Newton step is taken. Inductively, this occurs for all subsequence iterates 
and superlinear convergence follows. □ 
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